Inducing Probabilistic Syllable Classes Using Multivariate Clustering

Authors

  • Karin Müller
  • Bernd Möbius
  • Detlef Prescher
Abstract

An approach to automatic detection of syllable structure is presented. We demonstrate a novel application of EM-based clustering to multivariate data, exemplified by the induction of 3- and 5-dimensional probabilistic syllable classes. The 3-dimensional models were subjected to a pseudo-disambiguation task, the result of which shows that the onset is the most variable, or least predictable, part of the syllable. An extensive qualitative evaluation shows that the method yields phonologically meaningful syllable classes. We then propose a novel approach to grapheme-to-phoneme conversion and show that syllable structure represents valuable information for pronunciation systems.
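To make the idea of EM-based clustering over multivariate categorical data more concrete, here is a minimal sketch. It assumes a latent-class mixture in which each 3-dimensional observation (onset, nucleus, coda) is generated by a hidden class c with probability p(c) · p(onset|c) · p(nucleus|c) · p(coda|c). The abstract does not spell out the model, so this form, the function name `em_latent_classes`, the smoothing constant, and the toy syllables are all illustrative assumptions rather than the authors' implementation or data.

```python
import random
from collections import defaultdict

def em_latent_classes(data, n_classes, n_iter=50, seed=0, eps=1e-9):
    """Fit p(c) * prod_i p(y_i | c) to tuples of categorical values with EM.

    Hypothetical helper for illustration; eps is a tiny smoothing constant
    that only guards against division by zero.
    """
    rng = random.Random(seed)
    dims = len(data[0])
    vocab = [sorted({row[i] for row in data}) for i in range(dims)]

    # Random initialisation: uniform class prior, random emission tables.
    prior = [1.0 / n_classes] * n_classes
    emit = []
    for _ in range(n_classes):
        per_dim = []
        for i in range(dims):
            weights = [rng.random() + 0.1 for _ in vocab[i]]
            total = sum(weights)
            per_dim.append({v: w / total for v, w in zip(vocab[i], weights)})
        emit.append(per_dim)

    for _ in range(n_iter):
        # E-step: accumulate expected counts under the class posteriors.
        class_mass = [0.0] * n_classes
        counts = [[defaultdict(float) for _ in range(dims)]
                  for _ in range(n_classes)]
        for row in data:
            joint = []
            for c in range(n_classes):
                p = prior[c]
                for i, v in enumerate(row):
                    p *= emit[c][i][v]
                joint.append(p)
            z = sum(joint)
            for c in range(n_classes):
                post = joint[c] / z
                class_mass[c] += post
                for i, v in enumerate(row):
                    counts[c][i][v] += post

        # M-step: re-estimate parameters from the expected counts.
        prior = [m / len(data) for m in class_mass]
        for c in range(n_classes):
            for i in range(dims):
                denom = class_mass[c] + eps * len(vocab[i])
                emit[c][i] = {v: (counts[c][i][v] + eps) / denom
                              for v in vocab[i]}
    return prior, emit

# Invented toy syllables as (onset, nucleus, coda); "" marks an empty slot.
toy = [("t", "a", "n"), ("k", "a", "n"), ("t", "i", ""), ("k", "i", ""),
       ("", "a", "n"), ("", "i", ""), ("t", "a", "n"), ("k", "i", "")]
prior, emit = em_latent_classes(toy, n_classes=2)
```

Each induced class is then a triple of distributions over onsets, nuclei, and codas. A 5-dimensional variant would simply use longer tuples, although which additional features the authors use is not stated in the abstract.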

Similar resources

A Step-wise Usage-based Method for Inducing Polysemy-aware Verb Classes

We present an unsupervised method for inducing verb classes from verb uses in gigaword corpora. Our method consists of two clustering steps: verb-specific semantic frames are first induced by clustering verb uses in a corpus and then verb classes are induced by clustering these frames. By taking this step-wise approach, we can not only generate verb classes based on a massive amount of verb use...

Probabilistic Landslide Risk Analysis and Mapping (Case Study: Chehel-Chai Watershed, Golestan Province, Iran)

The efficiency of three statistical models, AHP surface-weighted density bivariate (semi-quantitative models), stepwise multivariate regression and logistic multivariate regression, was compared in the Chehel-Chai watershed in Golestan province, Iran. In the current study, the hazard map was prepared according to the top-ranked landslide hazard model. Chehel-Chai watershed is located as one of Go...

Spectral Clustering for German Verbs

We describe and evaluate the application of a spectral clustering technique (Ng et al., 2002) to the unsupervised clustering of German verbs. Our previous work has shown that standard clustering techniques succeed in inducing Levin-style semantic classes from verb subcategorisation information. But clustering in the very high dimensional spaces that we use is fraught with technical and conceptua...

Inducing German Semantic Verb Classes from Purely Syntactic Subcategorisation Information

The paper describes the application of k-means, a standard clustering technique, to the task of inducing semantic classes for German verbs. Using probability distributions over verb subcategorisation frames, we obtained an intuitively plausible clustering of 57 verbs into 14 classes. The automatic clustering was evaluated against independently motivated, hand-constructed semantic verb classes. A ...

Publication date: 2000